35 research outputs found

    Revisiting the Importance of Encoding Logic Rules in Sentiment Classification

    Full text link
    We analyze the performance of different sentiment classification models on syntactically complex inputs like A-but-B sentences. The first contribution of this analysis addresses reproducible research: to meaningfully compare different models, their accuracies must be averaged over far more random seeds than what has traditionally been reported. With proper averaging in place, we notice that the distillation model described in arXiv:1603.06318v4 [cs.LG], which incorporates explicit logic rules for sentiment classification, is ineffective. In contrast, using contextualized ELMo embeddings (arXiv:1802.05365v2 [cs.CL]) instead of logic rules yields significantly better performance. Additionally, we provide analysis and visualizations that demonstrate ELMo's ability to implicitly learn logic rules. Finally, a crowdsourced analysis reveals how ELMo outperforms baseline models even on sentences with ambiguous sentiment labels.Comment: EMNLP 2018 Camera Read

    Dual Language Models for Code Switched Speech Recognition

    Full text link
    In this work, we present a simple and elegant approach to language modeling for bilingual code-switched text. Since code-switching is a blend of two or more different languages, a standard bilingual language model can be improved upon by using structures of the monolingual language models. We propose a novel technique called dual language models, which involves building two complementary monolingual language models and combining them using a probabilistic model for switching between the two. We evaluate the efficacy of our approach using a conversational Mandarin-English speech corpus. We prove the robustness of our model by showing significant improvements in perplexity measures over the standard bilingual language model without the use of any external information. Similar consistent improvements are also reflected in automatic speech recognition error rates.Comment: Accepted at Interspeech 201

    Does Confidence Reporting from the Crowd Benefit Crowdsourcing Performance?

    Full text link
    We explore the design of an effective crowdsourcing system for an MM-ary classification task. Crowd workers complete simple binary microtasks whose results are aggregated to give the final classification decision. We consider the scenario where the workers have a reject option so that they are allowed to skip microtasks when they are unable to or choose not to respond to binary microtasks. Additionally, the workers report quantized confidence levels when they are able to submit definitive answers. We present an aggregation approach using a weighted majority voting rule, where each worker's response is assigned an optimized weight to maximize crowd's classification performance. We obtain a couterintuitive result that the classification performance does not benefit from workers reporting quantized confidence. Therefore, the crowdsourcing system designer should employ the reject option without requiring confidence reporting.Comment: 6 pages, 4 figures, SocialSens 2017. arXiv admin note: text overlap with arXiv:1602.0057

    DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages

    Full text link
    Disfluency correction (DC) is the process of removing disfluent elements like fillers, repetitions and corrections from spoken utterances to create readable and interpretable text. DC is a vital post-processing step applied to Automatic Speech Recognition (ASR) outputs, before subsequent processing by downstream language understanding tasks. Existing DC research has primarily focused on English due to the unavailability of large-scale open-source datasets. Towards the goal of multilingual disfluency correction, we present a high-quality human-annotated DC corpus covering four important Indo-European languages: English, Hindi, German and French. We provide extensive analysis of results of state-of-the-art DC models across all four languages obtaining F1 scores of 97.55 (English), 94.29 (Hindi), 95.89 (German) and 92.97 (French). To demonstrate the benefits of DC on downstream tasks, we show that DC leads to 5.65 points increase in BLEU scores on average when used in conjunction with a state-of-the-art Machine Translation (MT) system. We release code to run our experiments along with our annotated dataset here.Comment: Accepted at EMNLP 2023 Finding

    Accented Speech Recognition With Accent-specific Codebooks

    Full text link
    Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems. Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR. In this work, we propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks. These learnable codebooks capture accent-specific information and are integrated within the ASR encoder layers. The model is trained on accented English speech, while the test data also contained accents which were not seen during training. On the Mozilla Common Voice multi-accented dataset, we show that our proposed approach yields significant performance gains not only on the seen English accents (up to 37%37\% relative improvement in word error rate) but also on the unseen accents (up to 5%5\% relative improvement in WER). Further, we illustrate benefits for a zero-shot transfer setup on the L2Artic dataset. We also compare the performance with other approaches based on accent adversarial training.Comment: Accepted to EMNLP 2023 Main Conference (Long Paper
    corecore